Abstract

This article considers unbiased estimation of the mean, variance and sensitivity level of a sensitive variable via scrambled
response modeling. In particular, we focus on estimation of the mean. The use of additive and subtractive scrambling
is suggested under a recent scrambled response model. For the mean, the variance and the sensitivity level alike,
the proposed estimation scheme is shown to be relatively more efficient than that recent model. As far as the estimation
of the mean is concerned, the proposed estimators perform relatively better than the estimators based on recent additive
scrambling models. Relative efficiency comparisons are also made in order to highlight the performance of the proposed
estimators under the suggested scrambling technique.

Introduction

To procure reliable data on stigmatizing characteristics, Warner
[1] introduced the randomized response technique, in which the
respondent randomly selects, with known probability, one of two
complementary questions to answer. Greenberg et al. [2]
extended Warner's [1] work to collect data on quantitative
stigmatizing variables. Since then, several authors have worked on
quantitative randomized response models, including Eichhorn and
Hayre [3], Gupta and Shabbir [4], Gupta and Shabbir [5],
Bar-Lev et al. [6], Gupta et al. [7], Hussain and Shabbir [8], Saha [9],
Chaudhuri [10], Hussain and Shabbir [11] and references therein.
Quantitative randomized response models are classified into fully
(Eichhorn and Hayre [3]), partial (Gupta and Shabbir [5],
Bar-Lev et al. [6]) and optional randomized response models, or
ORRMs (Gupta et al. [4], Gupta et al. [7], Huang [12]). In a fully
randomized response model all the responses are obtained as
scrambled responses. In a partial randomized response model a known
proportion of respondents is asked to report their actual responses
while the others report scrambled responses.

Our focus in this article is on ORRMs only. The notion of
ORRM started with Gupta et al. [4]. The concept of ORRM is
based on the respondent's perception of the sensitivity of the
variable of interest. Using an ORRM, a respondent reports the
truth if he/she perceives the study variable as non-sensitive, and
scrambles the response if he/she perceives it as sensitive. The
proportion of respondents reporting the scrambled response is
unknown, and is termed the sensitivity level of the study variable.
Gupta et al. [4] used a multiplicative ORRM and provided an
unbiased estimator of the mean but a biased estimator of the
sensitivity level. Moreover, the Gupta et al. [4] ORRM
requires approximation in order to derive the variances of the
estimators. In the Gupta et al. [4] ORRM, simultaneous estimation of
mean and sensitivity is not possible. To avoid approximation,
Gupta et al. [7], Huang [12], Gupta et al. [13] and Mehta et al.
[14] proposed ORRMs that provide unbiased estimators of the mean
and sensitivity level. Gupta et al. [7] and Huang [12] are
one-stage ORRMs, Gupta et al. [13] is a two-stage ORRM whereas
Mehta et al. [14] is a three-stage ORRM. Gupta et al. [7], Gupta
et al. [13] and Mehta et al. [14] used additive scrambling whereas
Huang [12] used a linear combination of additive and
multiplicative scrambling. Further, Gupta et al. [15] observed that
additive scrambling yields more precise estimators than the linear
combination of additive and multiplicative scrambling used by Huang
[12]. Also, Gupta et al. [16] observed that in the Gupta et al. [13]
two-stage ORRM a large value of the truth parameter (T) is required when
the study variable is highly sensitive. Motivated by the advocacy of
additive scrambling and the requirement of a larger value of the truth
parameter (T), Mehta et al. [14] proposed a three-stage ORRM by
introducing a forced scrambling parameter (F). Mehta et al. [14]
established the better performance of the estimator of the mean but did
not discuss the performance of the sensitivity estimator. As far as
estimation of the mean is concerned, the Mehta et al. [14] ORRM can be
further improved by using multi-stage randomization, but this
results in poor estimation of the sensitivity level.

All of the ORRMs mentioned above share the common feature of
splitting the total sample into two subsamples. We base our
proposals on two strategies: (i) taking two subsamples and making
use of additive scrambling in one subsample and subtractive
scrambling in the other, and (ii) drawing a single sample and
collecting two responses from each respondent through additive
and subtractive scrambling. Through these strategies, we aim to
improve the Mehta et al. [14] ORRM for estimating the mean. As far
as estimation of the mean is concerned, we show that the proposed
ORRM is better than the Mehta et al. [14], Huang [12] and Gupta et
al. [13] ORRMs. We show that there is no need for a large value of
the parameter (T or F), whether the sensitivity of the study variable
is low, moderate or high. In addition, we also propose an
estimator of the variance of the study variable.

We now briefly discuss three of the background ORRMs, 
namely, the Mehta et al. [14], Huang [12] and Gupta et al. [13]. 



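Mehta et al. [14] ORRM

Assume that the interest lies in unbiased estimation of the mean
$\mu_x$ and the sensitivity level $W$ of the study variable $X$. Let
$D_i$ $(i = 1, 2)$ be the unrelated scrambling variables. Two
independent subsamples of size $n_i$ $(i = 1, 2)$ are drawn from the population
through simple random sampling with replacement such that
$n_1 + n_2 = n$, the total sample size required. In the $i$-th subsample, a fixed
predetermined proportion ($T$) of respondents is instructed to tell
the truth and a fixed predetermined proportion ($F$) of respondents
is instructed to scramble their response additively as $(X + D_i)$. The
remaining proportion $(1 - T - F)$ of respondents have the option
to scramble their response additively if they consider the study
variable sensitive. Otherwise, they can report the true response $X$.
Let $\mu_{D_i} = \theta_i$ be the known mean and $\sigma^2_{D_i} = \delta_i^2$ be the known
variance of the positive-valued random variable $D_i$ $(i = 1, 2)$. The
optional randomized response from the $j$-th respondent in the $i$-th
subsample is given by:

$$Z_{ij} = \alpha_j X_j + \beta_j (X_j + D_{ij}) + (1 - \alpha_j - \beta_j)\{(1 - Y_j) X_j + Y_j (X_j + D_{ij})\}, \tag{1}$$

where $i = 1, 2$; $j = 1, 2, \ldots, n_i$; $Y_j \sim \text{Bernoulli}(W)$, $\alpha_j \sim \text{Bernoulli}(T)$
and $\beta_j \sim \text{Bernoulli}(F)$. The expectation of the sample response $Z_{ij}$
from the $i$-th subsample is given by:

$$E(Z_{ij}) = \mu_x + (F + (1 - T - F)W)\theta_i.$$

Taking $\bar{Z}_1$ and $\bar{Z}_2$ as the observed means from the two
subsamples, Mehta et al. [14] proposed the following estimators
of $\mu_x$ and $W$:

$$\hat{\mu}_{XM} = \frac{\theta_1 \bar{Z}_2 - \theta_2 \bar{Z}_1}{\theta_1 - \theta_2}, \quad \theta_1 \neq \theta_2, \tag{2}$$

$$\hat{W}_M = \frac{1}{(1 - T - F)} \left\{ \frac{\bar{Z}_2 - \bar{Z}_1}{\theta_2 - \theta_1} - F \right\}, \quad T + F \neq 1, \ \theta_1 \neq \theta_2. \tag{3}$$

The variances of the estimators in (2) and (3) are given by:

$$\mathrm{Var}(\hat{\mu}_{XM}) = \frac{1}{(\theta_1 - \theta_2)^2} \left\{ \theta_2^2 \frac{\sigma^2_{Z_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{Z_2}}{n_2} \right\}, \tag{4}$$

$$\mathrm{Var}(\hat{W}_M) = \frac{1}{(1 - T - F)^2 (\theta_2 - \theta_1)^2} \left\{ \frac{\sigma^2_{Z_1}}{n_1} + \frac{\sigma^2_{Z_2}}{n_2} \right\}, \tag{5}$$

where

$$\sigma^2_{Z_i} = \sigma^2_x + \{F + (1 - T - F)W\}\{1 - (F + (1 - T - F)W)\}\theta_i^2 + \{F + (1 - T - F)W\}\delta_i^2, \quad (i = 1, 2). \tag{6}$$

Gupta et al. [13] ORRM

It is interesting to note that for $F = 0$, the Mehta et al. [14]
ORRM reduces to the Gupta et al. [13] ORRM. Let $Z'_{ij}$ be the
optional scrambled response from the $j$-th respondent in the $i$-th
subsample. Then, taking $F = 0$ in (1)-(5), the unbiased estimators and
their variances are given by:

$$\hat{\mu}_{XG} = \frac{\theta_1 \bar{Z}'_2 - \theta_2 \bar{Z}'_1}{\theta_1 - \theta_2}, \tag{7}$$

$$\hat{W}_G = \frac{\bar{Z}'_2 - \bar{Z}'_1}{(1 - T)(\theta_2 - \theta_1)}, \tag{8}$$

$$\mathrm{Var}(\hat{\mu}_{XG}) = \frac{1}{(\theta_1 - \theta_2)^2} \left\{ \theta_2^2 \frac{\sigma^2_{Z'_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{Z'_2}}{n_2} \right\}, \tag{9}$$

$$\mathrm{Var}(\hat{W}_G) = \frac{1}{(1 - T)^2 (\theta_1 - \theta_2)^2} \left\{ \frac{\sigma^2_{Z'_1}}{n_1} + \frac{\sigma^2_{Z'_2}}{n_2} \right\}, \tag{10}$$

where

$$\sigma^2_{Z'_i} = \sigma^2_x + W(1 - T)\{1 - W(1 - T)\}\theta_i^2 + W(1 - T)\delta_i^2.$$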
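The estimators in (2) and (3), together with the variance expressions (4)-(6), can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and argument order are hypothetical, `delta2` denotes the variance $\delta_i^2$ of the scrambling variable, and setting `F = 0` gives the Gupta et al. [13] special case in (7)-(10).

```python
from statistics import fmean


def mehta_estimates(z1, z2, theta1, theta2, T, F=0.0):
    """Point estimates of the mean and sensitivity level, as in (2) and (3).

    z1, z2 are the responses from the two subsamples; F = 0 reproduces (7), (8).
    """
    if theta1 == theta2:
        raise ValueError("theta1 and theta2 must differ")
    if T + F >= 1:
        raise ValueError("T + F must be smaller than 1")
    zbar1, zbar2 = fmean(z1), fmean(z2)
    mu_hat = (theta1 * zbar2 - theta2 * zbar1) / (theta1 - theta2)
    w_hat = ((zbar2 - zbar1) / (theta2 - theta1) - F) / (1 - T - F)
    return mu_hat, w_hat


def response_variance(var_x, theta, delta2, T, F, W):
    """Variance of a single response, as in (6)."""
    a = F + (1 - T - F) * W  # probability that a response is scrambled
    return var_x + a * (1 - a) * theta ** 2 + a * delta2


def mehta_variances(var_x, theta1, theta2, delta2_1, delta2_2, T, F, W, n1, n2):
    """Variances of the two estimators, as in (4) and (5)."""
    s1 = response_variance(var_x, theta1, delta2_1, T, F, W)
    s2 = response_variance(var_x, theta2, delta2_2, T, F, W)
    var_mu = (theta2 ** 2 * s1 / n1 + theta1 ** 2 * s2 / n2) / (theta1 - theta2) ** 2
    var_w = (s1 / n1 + s2 / n2) / ((1 - T - F) ** 2 * (theta2 - theta1) ** 2)
    return var_mu, var_w
```

As a quick check, subsample means equal to their expectations ($\mu_x = 5$, $T = 0.2$, $F = 0.3$, $W = 0.4$, $\theta_1 = 2$, $\theta_2 = 3$, so that the means are 6 and 6.5) return the estimates 5 and 0.4.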



Huang [12] ORRM

Each respondent in the $i$-th subsample is provided with two
randomization devices which generate two independent random
variables, say $S_i$ and $D_i$, from some pre-assigned distributions.
The respondent chooses by himself/herself one of the following
two options: (a) report the true response $X$ (if you do not feel the
study variable sensitive), or (b) report the scrambled response
$S_i X + D_i$ (if you feel the study variable sensitive). Let $\mu_{S_i} = 1$ be
the known mean and $\sigma^2_{S_i} = \gamma_i^2$ be the known variance of the
positive-valued random variable $S_i$. The optional randomized
response $Z''_{ij}$ from the $j$-th respondent in the $i$-th subsample is given by:

$$Z''_{ij} = (1 - Y_j) X_j + Y_j (S_{ij} X_j + D_{ij}). \tag{11}$$

The expectation of the sample response $Z''_{ij}$ from the $i$-th subsample is given
by:

$$E(Z''_{ij}) = (1 - W)\mu_x + W\{\mu_x + \theta_i\} = \mu_x + W\theta_i,$$

since $\mu_{S_i} = 1$. Huang [12] proposed the following estimators of $\mu_x$
and $W$:

$$\hat{\mu}_{XH} = \frac{\theta_1 \bar{Z}''_2 - \theta_2 \bar{Z}''_1}{\theta_1 - \theta_2}, \tag{12}$$

$$\hat{W}_H = \frac{\bar{Z}''_1 - \bar{Z}''_2}{\theta_1 - \theta_2}, \tag{13}$$

where $\bar{Z}''_1$ and $\bar{Z}''_2$ are the observed means from the two
subsamples. The variances of the estimators in (12) and (13) are given
by:

$$\mathrm{Var}(\hat{\mu}_{XH}) = \frac{1}{(\theta_1 - \theta_2)^2} \left\{ \theta_2^2 \frac{\sigma^2_{Z''_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{Z''_2}}{n_2} \right\}, \tag{14}$$

$$\mathrm{Var}(\hat{W}_H) = \frac{1}{(\theta_1 - \theta_2)^2} \left\{ \frac{\sigma^2_{Z''_1}}{n_1} + \frac{\sigma^2_{Z''_2}}{n_2} \right\}, \tag{15}$$

where

$$\sigma^2_{Z''_i} = \sigma^2_x + W(\mu_x^2 + \sigma_x^2)\gamma_i^2 + W(1 - W)\theta_i^2 + W\delta_i^2.$$

Proposed Procedures

In this section, we propose split sample and double response
approaches using the Mehta et al. [14] ORRM.

Split sample approach

Unlike Mehta et al. [14], in the proposed procedure we use
additive scrambling in one subsample and subtractive scrambling
in the other. The rest of the procedure is the same as that of Mehta et al.
[14]. Let $R_{1j}$ and $R_{2j}$ be the responses from the $j$-th $(j = 1, 2, \ldots, n_i)$
respondent selected in the $i$-th $(i = 1, 2)$ subsample; then $R_{1j}$ and $R_{2j}$
can be written as:

$$R_{1j} = \alpha_j X_j + \beta_j (X_j + D_{1j}) + (1 - \alpha_j - \beta_j)\{(1 - Y_j) X_j + Y_j (X_j + D_{1j})\}, \tag{16}$$

$$R_{2j} = \alpha_j X_j + \beta_j (X_j - D_{2j}) + (1 - \alpha_j - \beta_j)\{(1 - Y_j) X_j + Y_j (X_j - D_{2j})\}. \tag{17}$$

The expected responses from the two subsamples are given by:

$$E(R_{1j}) = \mu_x + (F + (1 - T - F)W)\theta_1, \tag{18}$$

$$E(R_{2j}) = \mu_x - (F + (1 - T - F)W)\theta_2. \tag{19}$$

Solving (18) and (19), we get:

$$\mu_x = \frac{\theta_1 E(R_{2j}) + \theta_2 E(R_{1j})}{\theta_1 + \theta_2},$$

$$W = \frac{1}{(1 - T - F)} \left\{ \frac{E(R_{1j}) - E(R_{2j})}{\theta_1 + \theta_2} - F \right\}.$$

Estimating $E(R_{1j})$ and $E(R_{2j})$ by the respective sample means $\bar{R}_1$
and $\bar{R}_2$, unbiased estimators of $\mu_x$ and $W$ are proposed as:

$$\hat{\mu}_{xz} = \frac{\theta_1 \bar{R}_2 + \theta_2 \bar{R}_1}{\theta_1 + \theta_2}, \tag{20}$$

$$\hat{W}_z = \frac{1}{(1 - T - F)} \left\{ \frac{\bar{R}_1 - \bar{R}_2}{\theta_1 + \theta_2} - F \right\}. \tag{21}$$

Unbiasedness of $\hat{\mu}_{xz}$ and $\hat{W}_z$ can be easily established through
(18) and (19). The variances of $\hat{\mu}_{xz}$ and $\hat{W}_z$ are given by:

$$\mathrm{Var}(\hat{\mu}_{xz}) = \frac{1}{(\theta_1 + \theta_2)^2} \left\{ \theta_2^2 \frac{\sigma^2_{R_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{R_2}}{n_2} \right\}, \tag{22}$$

$$\mathrm{Var}(\hat{W}_z) = \frac{1}{(1 - T - F)^2 (\theta_1 + \theta_2)^2} \left\{ \frac{\sigma^2_{R_1}}{n_1} + \frac{\sigma^2_{R_2}}{n_2} \right\}, \tag{23}$$

where $\sigma^2_{R_i} = \sigma^2_{Z_i}$.

It is important to note that subtractive scrambling in the second
subsample is the same as additive scrambling if $-D_2$ is viewed as
the new scrambling variable. We anticipate two advantages from
calling it subtractive scrambling. Firstly, it is easier just to subtract
a constant (randomly chosen by the respondent) from the actual
response on the sensitive variable. The second advantage is
psychological in nature. Perhaps, due to social desirability, a typical
respondent would like to report a response that is smaller in magnitude. In
other words, respondents would, in general, be happy to underreport.
Thus, subtracting a positive constant from the actual
response would help satisfy the social desirability of
underreporting. Of course, these two advantages are gained in the second
subsample only, since $D_1$ and $D_2$ are positive-valued random
variables. On average, the effect of additive scrambling in one
subsample is offset by subtractive scrambling in the other. As a
result, parameters are estimated with increased precision.

Theorem 2.1: For $T + F < 1$, $\hat{\mu}_{xz} \sim N(\mu_x, \mathrm{Var}(\hat{\mu}_{xz}))$ and
$\hat{W}_z \sim N(W, \mathrm{Var}(\hat{W}_z))$.

Proof: Since $\hat{\mu}_{xz}$ and $\hat{W}_z$ are linear combinations of
sample means, application of the central limit theorem gives the
required result for large $n_1$ and $n_2$.

In view of the fact that $s^2_{R_i} = (n_i - 1)^{-1} \sum_{j=1}^{n_i} (R_{ij} - \bar{R}_i)^2$ is an
unbiased estimator of $\sigma^2_{R_i}$, we have the following theorems.

Theorem 2.2: An unbiased estimator of $\mathrm{Var}(\hat{\mu}_{xz})$ is given by:

$$\widehat{\mathrm{Var}}(\hat{\mu}_{xz}) = \frac{1}{(\theta_1 + \theta_2)^2} \left\{ \theta_2^2 \frac{s^2_{R_1}}{n_1} + \theta_1^2 \frac{s^2_{R_2}}{n_2} \right\}.$$

Theorem 2.3: An unbiased estimator of $\mathrm{Var}(\hat{W}_z)$ is
given by:

$$\widehat{\mathrm{Var}}(\hat{W}_z) = \frac{1}{(1 - T - F)^2 (\theta_1 + \theta_2)^2} \left\{ \frac{s^2_{R_1}}{n_1} + \frac{s^2_{R_2}}{n_2} \right\}.$$

Proofs: The proofs of Theorems 2.2 and 2.3 follow
directly from the fact that $E(s^2_{R_i}) = \sigma^2_{R_i}$.
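To make the split sample estimators concrete, the following Python sketch computes the point estimates of the form (20) and (21), i.e. $(\theta_1 \bar{R}_2 + \theta_2 \bar{R}_1)/(\theta_1 + \theta_2)$ for the mean and $\{(\bar{R}_1 - \bar{R}_2)/(\theta_1 + \theta_2) - F\}/(1 - T - F)$ for the sensitivity level, together with the variance estimators of Theorems 2.2 and 2.3. The function name and the dictionary keys are hypothetical; `r1` holds the additively scrambled responses and `r2` the subtractively scrambled ones.

```python
from statistics import fmean, variance


def split_sample_estimates(r1, r2, theta1, theta2, T, F):
    """Split sample estimates of the mean and sensitivity level.

    Returns the point estimates and the estimated variances obtained by
    plugging the sample variances (divisor n_i - 1) into the variance formulas.
    """
    if T + F >= 1:
        raise ValueError("T + F must be smaller than 1")
    n1, n2 = len(r1), len(r2)
    rbar1, rbar2 = fmean(r1), fmean(r2)
    s2_1, s2_2 = variance(r1), variance(r2)  # unbiased sample variances
    total = theta1 + theta2
    c = 1 - T - F
    return {
        "mu_hat": (theta1 * rbar2 + theta2 * rbar1) / total,
        "w_hat": ((rbar1 - rbar2) / total - F) / c,
        "var_mu_hat": (theta2 ** 2 * s2_1 / n1 + theta1 ** 2 * s2_2 / n2) / total ** 2,
        "var_w_hat": (s2_1 / n1 + s2_2 / n2) / (c * total) ** 2,
    }
```

For instance, with `r1 = [6.0, 8.0]`, `r2 = [3.0, 5.0]`, $\theta_1 = 2$, $\theta_2 = 3$, $T = 0.2$ and $F = 0.3$, the sample means are 7 and 4, so the estimate of the mean is $(2 \cdot 4 + 3 \cdot 7)/5 = 5.8$ and the estimate of the sensitivity level is $(3/5 - 0.3)/0.5 = 0.6$.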

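Theorem 2.4: An unbiased estimator of $\mathrm{Var}(Y)$ is given by:

$$\widehat{\mathrm{Var}}(Y) = \hat{W}_z (1 - \hat{W}_z) + \widehat{\mathrm{Var}}(\hat{W}_z).$$

Proof: Applying the expectation operator to $\widehat{\mathrm{Var}}(Y)$, we get:

$$E\{\widehat{\mathrm{Var}}(Y)\} = E(\hat{W}_z) - E(\hat{W}_z^2) + E\{\widehat{\mathrm{Var}}(\hat{W}_z)\}.$$

Then, applying Theorem 2.3, we get:

$$E\{\widehat{\mathrm{Var}}(Y)\} = E(\hat{W}_z) - E(\hat{W}_z^2) + \mathrm{Var}(\hat{W}_z)$$

$$E\{\widehat{\mathrm{Var}}(Y)\} = E(\hat{W}_z) - \left[\mathrm{Var}(\hat{W}_z) + \{E(\hat{W}_z)\}^2\right] + \mathrm{Var}(\hat{W}_z)$$

$$E\{\widehat{\mathrm{Var}}(Y)\} = W - W^2 = W(1 - W).$$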

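Now, we consider the estimation of the variance $\sigma_x^2$ of the sensitive
variable $X$. Provided that $\delta_2^2 - \delta_1^2 \neq 0$, from (6) we can, after
simple algebra, write that

$$\sigma_x^2 = \frac{\delta_2^2 \sigma^2_{R_1} - \delta_1^2 \sigma^2_{R_2} - (\delta_2^2 \theta_1^2 - \delta_1^2 \theta_2^2) A(1 - A)}{\delta_2^2 - \delta_1^2}, \quad A = \{F + (1 - T - F)W\}.$$

We define unbiased estimators of $\sigma_x^2$ in the following theorems.

Theorem 2.5: In case when $\delta_2^2 - \delta_1^2 \neq 0$, an unbiased estimator
of $\sigma_x^2$ is given by:

$$\hat{\sigma}^2_{xz} = \frac{\delta_2^2 s^2_{R_1} - \delta_1^2 s^2_{R_2} - (\delta_2^2 \theta_1^2 - \delta_1^2 \theta_2^2)\{F(1 - F) + (1 - 2F)(1 - T - F)\hat{W}_z - (1 - T - F)^2 (\hat{W}_z^2 - \widehat{\mathrm{Var}}(\hat{W}_z))\}}{\delta_2^2 - \delta_1^2}. \tag{24}$$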
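The estimator of Theorem 2.5 can be sketched in Python as follows. The sketch assumes that the sample variances of the two subsamples, the sensitivity estimate and its estimated variance have already been computed; the function name and argument names are hypothetical, and `delta2_1`, `delta2_2` denote the known variances $\delta_1^2$ and $\delta_2^2$ of the scrambling variables. The bracketed term is the expansion of $A(1 - A)$ with $A = F + (1 - T - F)W$, in which $W^2$ is replaced by its unbiased estimator.

```python
def sigma2_x_estimate(s2_r1, s2_r2, w_hat, var_w_hat,
                      theta1, theta2, delta2_1, delta2_2, T, F):
    """Estimate of the variance of X in the form of (24).

    Requires delta2_2 != delta2_1; the equal-variance case is the subject
    of Theorem 2.6 and is not covered by this sketch.
    """
    if delta2_1 == delta2_2:
        raise ValueError("this form needs delta2_2 - delta2_1 != 0")
    c = 1 - T - F
    w2_hat = w_hat ** 2 - var_w_hat  # unbiased estimate of W^2
    # Estimate of A(1 - A) = F(1 - F) + (1 - 2F) c W - c^2 W^2
    a_term = F * (1 - F) + (1 - 2 * F) * c * w_hat - c ** 2 * w2_hat
    numerator = (delta2_2 * s2_r1 - delta2_1 * s2_r2
                 - (delta2_2 * theta1 ** 2 - delta2_1 * theta2 ** 2) * a_term)
    return numerator / (delta2_2 - delta2_1)
```

Plugging in population values instead of estimates recovers the true variance: with $\sigma_x^2 = 1.5$, $T = 0.2$, $F = 0.3$, $W = 0.4$, $\theta_1 = 2$, $\theta_2 = 3$, $\delta_1^2 = 1$ and $\delta_2^2 = 2$, the response variances are 3 and 4.75, and the function returns 1.5 when called with these values, `w_hat = 0.4` and `var_w_hat = 0`.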



Theorem 2.6: In case when $\delta_2^2 - \delta_1^2 = 0$, an unbiased estimator
of $\sigma_x^2$ is given by:

(25)

where the constant appearing in (25) is known and belongs to the interval [0,1],
and $A = \{F + (1 - T - F)\hat{W}_z\}$.

Proofs: Theorems 2.5 and 2.6 can be proved by
noting that $\hat{W}_z$ and $(\hat{W}_z^2 - \widehat{\mathrm{Var}}(\hat{W}_z))$ are unbiased estimators
of $W$ and $W^2$ respectively. Taking expectation of (24) and (25), we
get $E(\hat{\sigma}^2_{xz}) = \sigma_x^2$.

Double response approach

Without incurring any additional sampling cost, the Mehta et al.
[14] ORRM may also be improved by taking two responses from
each respondent. We take the scrambling variables to be the same as
defined in the Mehta et al. [14] ORRM. To report the first (second)
response, respondents are requested to use additive (subtractive)
scrambling with the variable $D_1$ ($D_2$). Let $R'_{1j}$ and $R'_{2j}$ be the two
responses of the $j$-th respondent; then the two responses can be written
as

$$R'_{1j} = \alpha_j X_j + \beta_j (X_j + D_{1j}) + (1 - \alpha_j - \beta_j)\{(1 - Y_j) X_j + Y_j (X_j + D_{1j})\}, \tag{26}$$

$$R'_{2j} = \alpha_j X_j + \beta_j (X_j - D_{2j}) + (1 - \alpha_j - \beta_j)\{(1 - Y_j) X_j + Y_j (X_j - D_{2j})\}. \tag{27}$$

It is obvious from (26) and (27) that the true value of the sensitive
variable $X_j$ cannot be worked out for respondents who feel the study
variable is sensitive. The two reported responses of a particular
respondent are the same if he/she feels the study variable is
insensitive, since in this case he/she reports the true value of the
study variable both times. This is not a concern, since respondents
who feel the study variable is insensitive are willing to disclose their
true value of the sensitive variable. Thus, it may be concluded that
the privacy of respondents who feel the study variable is sensitive remains
intact. As correctly pointed out by one of the referees, there is
an extra burden on the respondent if he/she has to report twice. This
issue may be tackled by explaining the whole procedure to the
respondent before actually obtaining the data. He/she must be
assured that his/her actual value of the sensitive variable cannot
be traced back from the reported responses. Further, it must be
made clear to him/her that the interest of the study lies in the estimation of
parameters only. Moreover, we do not need any additional
sampling cost to obtain two responses. Thus, obtaining two
responses from a respondent should not be an issue in a particular
study.

The expected responses from the $j$-th respondent are the same as
given by (18) and (19). Thus $E(R'_{1j}) = E(R_{1j})$ and $E(R'_{2j}) = E(R_{2j})$.
This implies that unbiased estimators of $\mu_x$ and $W$ may
be suggested as:

$$\hat{\mu}'_{xz} = \frac{\theta_1 \bar{R}'_2 + \theta_2 \bar{R}'_1}{\theta_1 + \theta_2}, \tag{28}$$

$$\hat{W}'_z = \frac{1}{(1 - T - F)} \left\{ \frac{\bar{R}'_1 - \bar{R}'_2}{\theta_1 + \theta_2} - F \right\}. \tag{29}$$

The variances of $\hat{\mu}'_{xz}$ and $\hat{W}'_z$ are given by:

$$\mathrm{Var}(\hat{\mu}'_{xz}) = \frac{1}{(\theta_1 + \theta_2)^2} \left\{ \theta_2^2 \frac{\sigma^2_{R'_1}}{n} + \theta_1^2 \frac{\sigma^2_{R'_2}}{n} + 2\theta_1 \theta_2 \frac{\mathrm{Cov}(R'_1, R'_2)}{n} \right\}, \tag{30}$$

$$\mathrm{Var}(\hat{W}'_z) = \frac{1}{(1 - T - F)^2 (\theta_1 + \theta_2)^2} \left\{ \frac{\sigma^2_{R'_1}}{n} + \frac{\sigma^2_{R'_2}}{n} - \frac{2\,\mathrm{Cov}(R'_1, R'_2)}{n} \right\}, \tag{31}$$

where

$$\mathrm{Cov}(R'_1, R'_2) = E(R'_1 R'_2) - E(R'_1) E(R'_2),$$

$$\mathrm{Cov}(R'_1, R'_2) = \sigma_x^2 - \{F + (1 - T - F)W\}[1 - \{F + (1 - T - F)W\}]\theta_1 \theta_2.$$

In some studies, the interest of researchers lies in estimating
$\mu_x$ rather than the sensitivity level $W$ of the variable $X$, while $W$ is of
major interest in other studies. Following Huang [12], we define a
linear combination of $\mathrm{Var}(\hat{\mu}_{xz})$ and $\mathrm{Var}(\hat{W}_z)$ in order to find
the optimum allocation of the sample size. Thus, depending upon the
interest of researchers, optimum subsample sizes can be obtained.

Consider,

$$\mathrm{Var}(\hat{\mu}_{xz}, \hat{W}_z) = k\,\mathrm{Var}(\hat{\mu}_{xz}) + (1 - k)\,\mathrm{Var}(\hat{W}_z), \quad k \in [0, 1].$$

Using the Lagrange approach to minimize $\mathrm{Var}(\hat{\mu}_{xz}, \hat{W}_z)$ under the
restriction that $\sum_{i=1}^{2} n_i = n$, we get:

$$n_1 = n\, \frac{\sqrt{\sigma^2_{R_1}\{k(\theta_2^2 - 1) + 1\}}}{\sqrt{\sigma^2_{R_1}\{k(\theta_2^2 - 1) + 1\}} + \sqrt{\sigma^2_{R_2}\{k(\theta_1^2 - 1) + 1\}}},$$

and

$$n_2 = n\, \frac{\sqrt{\sigma^2_{R_2}\{k(\theta_1^2 - 1) + 1\}}}{\sqrt{\sigma^2_{R_1}\{k(\theta_2^2 - 1) + 1\}} + \sqrt{\sigma^2_{R_2}\{k(\theta_1^2 - 1) + 1\}}}.$$

With these optimum sample sizes, the minimum value of
$\mathrm{Var}(\hat{\mu}_{xz}, \hat{W}_z)$ is given by:

Min. $\mathrm{Var}(\hat{\mu}_{xz}, \hat{W}_z) =$

In practice, $\sigma^2_{R_i}$ is unknown and the optimum allocation of
sample sizes cannot be made. Following Murthy [17], the
unknown values of $\sigma^2_{R_i}$ can be estimated from pilot surveys or past
experience, or simply an intelligent guess can be made about $\sigma^2_{R_i}$.

Privacy Protection Discussion

There are many privacy measures suggested by different
authors. We take $E(Z_j - X_j)^2$ as the measure of privacy. This
measure of privacy was proposed by Zaizai et al. [18]. A given model
is taken as more protective of privacy if $E(Z_j - X_j)^2$ is higher.
For a model providing privacy protection to some extent,
$E(Z_j - X_j)^2 > 0$. On the other hand, if a model does not provide
any privacy, $E(Z_j - X_j)^2 = 0$. For a given model, the larger the
$E(Z_j - X_j)^2$, the larger the privacy provided by the model.

The measures of privacy for the Mehta et al. [14] ORRM are given
by $W(1 - T - F)(\theta_1^2 + \delta_1^2)$ and $W(1 - T - F)(\theta_2^2 + \delta_2^2)$ in the first
and second subsamples, respectively. Similarly, for the Gupta et al.
[13] model it is $W(1 - T)(\theta_1^2 + \delta_1^2)$ in the first subsample, and
$W(1 - T)(\theta_2^2 + \delta_2^2)$ in the second subsample. This shows that, in both
the subsamples, the Gupta et al. [13] ORRM is more protective
compared to the Mehta et al. [14] ORRM. The measures of privacy
for the Huang [12] ORRM are given by $(\mu_x^2 + \sigma_x^2) W \gamma_1^2 + W(\theta_1^2 + \delta_1^2)$
and $(\mu_x^2 + \sigma_x^2) W \gamma_2^2 + W(\theta_2^2 + \delta_2^2)$ in the first and
second subsamples, respectively. The measures of privacy for the
proposed estimator in the split sample approach are the same as those
of the Mehta et al. [14] ORRM. In the double response approach the
measure of privacy is given by

$$\frac{W(1 - T - F)}{4}\left(\theta_1^2 + \delta_1^2 + \delta_2^2 + \theta_2^2 - 2\theta_1 \theta_2\right),$$

which is equal to the measure of privacy provided by the Mehta et
al. [14] ORRM if and only if $3(\theta_1^2 + \delta_1^2) = (\theta_2^2 + \delta_2^2 - 2\theta_1 \theta_2)$ or
$3E(D_1^2) = \{E(D_2^2) - 2E(D_1)E(D_2)\}$. This shows that the
proposed double response approach may be made more protective
compared to the Mehta et al. [14] ORRM at the cost of increased
variance. In fact, there is a trade-off between efficiency and privacy
protection. That is, we can have a highly efficient estimator by
compromising on privacy. Similarly, we can build a more
protective model by compromising on efficiency.

Efficiency Comparison

We compare the proposed split sample and double response
approaches with the Mehta et al. [14], Huang [12] and Gupta et
al. [13] ORRMs in terms of relative efficiency.



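(i) $\hat{\mu}_{xz}$ versus $\hat{\mu}_{XM}$ and $\hat{W}_z$ versus $\hat{W}_M$

The proposed estimators $\hat{\mu}_{xz}$ and $\hat{W}_z$ are relatively more
efficient than the corresponding estimators $\hat{\mu}_{XM}$ and $\hat{W}_M$ of
Mehta et al. [14] if $\mathrm{Var}(\hat{\mu}_{XM}) > \mathrm{Var}(\hat{\mu}_{xz})$ and $\mathrm{Var}(\hat{W}_M) >
\mathrm{Var}(\hat{W}_z)$. Since $\sigma^2_{R_i} = \sigma^2_{Z_i}$, from (4), (5), (22) and (23), it is easy to
show that $\hat{\mu}_{xz}$ and $\hat{W}_z$ are relatively more efficient than $\hat{\mu}_{XM}$ and
$\hat{W}_M$ if

$$\frac{(\theta_2 + \theta_1)^2}{(\theta_2 - \theta_1)^2} > 1,$$

which is always true, since $\theta_1$ and $\theta_2$ are the means of positive-valued scrambling variables and are therefore positive.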
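The comparison in (i) can be checked numerically. The short Python sketch below evaluates the variance of the Mehta et al. [14] mean estimator and of the split sample mean estimator for the same response variances and subsample sizes; the two expressions differ only in the denominator, $(\theta_1 - \theta_2)^2$ versus $(\theta_1 + \theta_2)^2$. Function names and the numerical values are illustrative assumptions, not values taken from the article.

```python
def var_mean_mehta(s2_1, s2_2, n1, n2, theta1, theta2):
    """Variance of the mean estimator with additive scrambling in both subsamples."""
    return (theta2 ** 2 * s2_1 / n1 + theta1 ** 2 * s2_2 / n2) / (theta1 - theta2) ** 2


def var_mean_split(s2_1, s2_2, n1, n2, theta1, theta2):
    """Variance of the mean estimator with additive and subtractive scrambling."""
    return (theta2 ** 2 * s2_1 / n1 + theta1 ** 2 * s2_2 / n2) / (theta1 + theta2) ** 2


# Illustrative values: response variances 3.0 and 4.75, n1 = n2 = 25.
ratio = (var_mean_mehta(3.0, 4.75, 25, 25, 2.0, 3.0)
         / var_mean_split(3.0, 4.75, 25, 25, 2.0, 3.0))
print(ratio)  # equals ((theta1 + theta2) / (theta1 - theta2)) ** 2
```

With $\theta_1 = 2$ and $\theta_2 = 3$ the ratio is $(5/1)^2 = 25$, and it grows without bound as $\theta_1$ and $\theta_2$ approach each other.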

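(ii) $\hat{\mu}_{xz}$ versus $\hat{\mu}_{XG}$ and $\hat{\mu}_{XH}$

The proposed estimator $\hat{\mu}_{xz}$ is relatively more efficient than
$\hat{\mu}_{XG}$ and $\hat{\mu}_{XH}$ if $\mathrm{Var}(\hat{\mu}_{XG}) > \mathrm{Var}(\hat{\mu}_{xz})$ and $\mathrm{Var}(\hat{\mu}_{XH}) >
\mathrm{Var}(\hat{\mu}_{xz})$. From (9), (14) and (22), we see that it is difficult to
derive the efficiency conditions for $\hat{\mu}_{xz}$. We calculated the relative
efficiency numerically through simulations by defining

$$RE_1 = \frac{\mathrm{Var}(\hat{\mu}_{XG})}{\mathrm{Var}(\hat{\mu}_{xz})} \quad \text{and} \quad RE_2 = \frac{\mathrm{Var}(\hat{\mu}_{XH})}{\mathrm{Var}(\hat{\mu}_{xz})}.$$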



For a simulation study, we fixed
$n_1 = n_2 = 25$. We assumed that $X \sim N(\mu_x, \sigma_x^2)$, $D_i \sim N(\theta_i, \delta_i^2)$ and
$S_i \sim N(1, \gamma_i^2)$, $i = 1, 2$. To simulate the data from the first
subsample, we generated $n_1 = 25$ values from a Bernoulli variable,
say $Q$, with the parameter $\{F + (1 - T - F)W\}$, where $F$, $T$ and
$W$ are known. We then generated $n_1 = 25$ random values each
of the variables $X$ and $D_1$ from $X \sim N(\mu_x, \sigma_x^2)$ and
$D_1 \sim N(\theta_1, \delta_1^2)$, respectively. We took $R_{1j} = X_j$ if $Q = 0$, and
$R_{1j} = X_j + D_{1j}$ otherwise. Similarly, $n_2 = 25$ values of $R_2$ from
the second subsample are generated as: $R_{2j} = X_j$ if $Q = 0$, and
$R_{2j} = X_j - D_{2j}$ otherwise. The same algorithm is used to generate the
values of $Z'_{ij}$ and $Z''_{ij}$ $(i = 1, 2;\ j = 1, 2, \ldots, 25)$. Once the data have
been generated, the estimators $(\hat{\mu}_{xz}, \hat{\mu}_{XG}, \hat{\mu}_{XH})$ are computed
using the corresponding formulae in (20), (7) and (12). The
variances of these estimators are obtained using 5000 iterations.
The relative efficiency results (for the different scenarios given
below) are given in Figures 1-4.

Figure 1. $RE_1$ and $RE_2$ for $(\mu_x^2, \sigma_x^2) = (1,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,3)$. Panels plot the relative efficiencies against $W$ for $(T, F)$ = (0.14, 0.14), (0.28, 0.42), (0.42, 0.42), (0.56, 0.42), (0.7, 0.28) and (0.84, 0.14).
doi:10.1371/journal.pone.0083557.g001

Figure 2. $RE_1$ and $RE_2$ for $(\mu_x^2, \sigma_x^2) = (1,2)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,3)$. Panels as in Figure 1.
doi:10.1371/journal.pone.0083557.g002

Figure 3. $RE_1$ and $RE_2$ for $(\mu_x^2, \sigma_x^2) = (1,2)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,3)$. Panels as in Figure 1.
doi:10.1371/journal.pone.0083557.g003

Figure 4. $RE_1$ and $RE_2$ for $(\mu_x^2, \sigma_x^2) = (1,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,4)$.
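The simulation steps described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions rather than the code used for the figures: the function names and default parameter values are hypothetical (in particular, the defaults take the scrambling means as 2 and 3 purely for illustration), the Gupta et al. [13] responses are scrambled with probability $W(1 - T)$ and the Huang [12] responses with probability $W$, and normal scrambling variables are used as in the text.

```python
import random
from statistics import fmean, variance


def mean_response(rng, n, p_scramble, mu_x, sd_x, theta, sd_d, sign=1.0, sd_s=None):
    """Mean of n simulated responses, each scrambled with probability p_scramble."""
    responses = []
    for _ in range(n):
        x = rng.gauss(mu_x, sd_x)
        if rng.random() < p_scramble:  # Q = 1: report a scrambled response
            d = rng.gauss(theta, sd_d)
            s = rng.gauss(1.0, sd_s) if sd_s is not None else 1.0
            x = s * x + sign * d       # additive (+) or subtractive (-) scrambling
        responses.append(x)
    return fmean(responses)


def simulate_relative_efficiency(T, F, W, mu_x=1.0, var_x=1.0, thetas=(2.0, 3.0),
                                 var_d=(1.0, 1.0), var_s=(2.0, 3.0),
                                 n=25, iterations=5000, seed=0):
    """Monte Carlo estimates of RE1 (vs Gupta et al.) and RE2 (vs Huang)."""
    rng = random.Random(seed)
    th1, th2 = thetas
    sd_x = var_x ** 0.5
    sd_d = [v ** 0.5 for v in var_d]
    sd_s = [v ** 0.5 for v in var_s]
    p_split = F + (1 - T - F) * W   # scrambling probability, proposed procedure
    p_gupta = (1 - T) * W           # F = 0 special case
    p_huang = W
    est_split, est_gupta, est_huang = [], [], []
    for _ in range(iterations):
        r1 = mean_response(rng, n, p_split, mu_x, sd_x, th1, sd_d[0], sign=1.0)
        r2 = mean_response(rng, n, p_split, mu_x, sd_x, th2, sd_d[1], sign=-1.0)
        est_split.append((th1 * r2 + th2 * r1) / (th1 + th2))
        g1 = mean_response(rng, n, p_gupta, mu_x, sd_x, th1, sd_d[0])
        g2 = mean_response(rng, n, p_gupta, mu_x, sd_x, th2, sd_d[1])
        est_gupta.append((th1 * g2 - th2 * g1) / (th1 - th2))
        h1 = mean_response(rng, n, p_huang, mu_x, sd_x, th1, sd_d[0], sd_s=sd_s[0])
        h2 = mean_response(rng, n, p_huang, mu_x, sd_x, th2, sd_d[1], sd_s=sd_s[1])
        est_huang.append((th1 * h2 - th2 * h1) / (th1 - th2))
    v_split = variance(est_split)
    return variance(est_gupta) / v_split, variance(est_huang) / v_split
```

Calling `simulate_relative_efficiency(T=0.28, F=0.42, W=0.5)` returns one pair of simulated relative efficiencies; repeating the call over a grid of `W` values for each `(T, F)` pair gives curves of the kind summarized in the figures. Both ratios come out well above 1 for these illustrative inputs, mainly because the split sample estimator divides by $(\theta_1 + \theta_2)^2$ instead of $(\theta_1 - \theta_2)^2$.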

(iii) $\hat{\mu}'_{xz}$ versus $\hat{\mu}_{XG}$ and $\hat{\mu}_{XH}$

The proposed estimator $\hat{\mu}'_{xz}$ is relatively more efficient than
$\hat{\mu}_{XG}$ and $\hat{\mu}_{XH}$ if $\mathrm{Var}(\hat{\mu}_{XG}) > \mathrm{Var}(\hat{\mu}'_{xz})$ and $\mathrm{Var}(\hat{\mu}_{XH}) >
\mathrm{Var}(\hat{\mu}'_{xz})$. From (9), (14) and (30), we see that it is difficult to
derive the efficiency conditions for $\hat{\mu}'_{xz}$. We, again, calculated the
relative efficiency numerically through simulations by defining

$$RE_3 = \frac{\mathrm{Var}(\hat{\mu}_{XG})}{\mathrm{Var}(\hat{\mu}'_{xz})} \quad \text{and} \quad RE_4 = \frac{\mathrm{Var}(\hat{\mu}_{XH})}{\mathrm{Var}(\hat{\mu}'_{xz})}.$$

We used a similar algorithm to simulate the values of $R'_{ij}$, $Z'_{ij}$ and $Z''_{ij}$. It is to be
noted that we simulated $n_1 + n_2 = 50$ values of $R'_i$ $(i = 1, 2)$ and 25
values each of $Z'_i$ and $Z''_i$ $(i = 1, 2)$. The relative efficiency results
are given in Figures 5-8.

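To calculate $RE_1$, $RE_2$, $RE_3$ and $RE_4$, we take the following
different scenarios:

a. $(\mu_x^2, \sigma_x^2) = (1,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$, $(\gamma_1^2, \gamma_2^2) = (2,3)$

b. $(\mu_x^2, \sigma_x^2) = (1,2)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$, $(\gamma_1^2, \gamma_2^2) = (2,3)$

c. $(\mu_x^2, \sigma_x^2) = (2,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$, $(\gamma_1^2, \gamma_2^2) = (2,3)$

d. $(\mu_x^2, \sigma_x^2) = (1,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$, $(\gamma_1^2, \gamma_2^2) = (2,4)$,

and study the effect of $\gamma_1^2$ and $\gamma_2^2$ on $RE_1$, $RE_2$, $RE_3$ and $RE_4$.
The relative efficiencies are calculated for different values of $T$ and
$F$ over the whole range of $W$. It is observed that the proposed
estimator $\hat{\mu}_{xz}$ performs better (in terms of relative efficiency) than
$\hat{\mu}_{XG}$ and $\hat{\mu}_{XH}$. Also, the proposed estimator $\hat{\mu}'_{xz}$ performs
relatively better than $\hat{\mu}_{XG}$ and $\hat{\mu}_{XH}$. It can easily be verified
through simulations that $RE_1$, $RE_2$, $RE_3$ and $RE_4$ are
independent of $n_1 = n_2$. To save space we have not presented the
graphs for varying values of $n_1 = n_2$. From Figures 1-8 the following
observations are made.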

(i) $RE_1$, $RE_2$, $RE_3$ and $RE_4$ are not seriously affected by the
difference between $\gamma_1^2$ and $\gamma_2^2$ when the other parameters
are fixed (see Figures 1 and 4 or 5 and 8).

(ii) $RE_1$, $RE_2$, $RE_3$ and $RE_4$ increase, over the whole range of
$W$, with an increase in $T$ when the other parameters,
except $F$, are kept fixed (see Figures 1-8).

(iii) $RE_1$, $RE_2$, $RE_3$ and $RE_4$ are not seriously affected by a
change in $\mu_x^2$ and/or $\sigma_x^2$ (see Figures 1 and 3, and 5 and 7
or 1 and 2 and 5 and 6).

(iv) The split sample approach is more efficient than the double
response approach.

(v) The proposed estimators of the mean through the split sample and
double response approaches do not need smaller values
of $T$ irrespective of the sensitivity level $W$ and the forced
scrambling parameter $F$.

Figure 5. $RE_3$ and $RE_4$ for $(\mu_x^2, \sigma_x^2) = (1,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,3)$.

Figure 6. $RE_3$ and $RE_4$ for $(\mu_x^2, \sigma_x^2) = (1,2)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,3)$.
doi:10.1371/journal.pone.0083557.g006

Figure 7. $RE_3$ and $RE_4$ for $(\mu_x^2, \sigma_x^2) = (2,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,3)$.

Figure 8. $RE_3$ and $RE_4$ for $(\mu_x^2, \sigma_x^2) = (1,1)$, $(\theta_1^2, \theta_2^2) = (2,3)$, $(\delta_1^2, \delta_2^2) = (1,1)$ and $(\gamma_1^2, \gamma_2^2) = (2,4)$.
doi:10.1371/journal.pone.0083557.g008



Conclusion 

The optional randomized response model of Mehta et al. [14] is
improved for estimating the mean, variance and sensitivity level of a
sensitive variable. Utilizing the idea of additive scrambling in
one subsample and subtractive scrambling in the other, we
have proposed unbiased estimators of the mean, variance and
sensitivity level. We compared the proposed procedure with the
Mehta et al. [14], Huang [12] and Gupta et al. [13] procedures.
The proposed idea resulted in improved estimation of the mean of
the study variable. It has been shown by Huang [12] that his
procedure works better than the Gupta et al. [4] procedure.
Therefore, the proposed split sample procedure is also better than
the Gupta et al. [4] procedure, both in terms of relative efficiency and
in providing unbiased estimators of the mean $\mu_x$, sensitivity level $W$
and variance $\sigma_x^2$ of the study variable. Like Huang [12], the
proposed procedure has the advantage of estimating the
variance of $Y$ with no bias. Unlike Gupta et al. [4], the proposed
procedures do not require a larger value of the truth parameter (T)
when the study variable is highly sensitive. This may be considered
the major advantage of the proposed procedures. It has been
established that the proposed procedure of estimating the mean is
more efficient than all the procedures considered in this study.
Moreover, as far as the estimation of sensitivity is concerned, we
observed that the proposed estimators are less efficient (not shown
in the figures) than all the estimators considered here except
those of Mehta et al. [14].

As a final comment, we recommend using the proposed procedures
in field surveys, without increasing the sampling cost, when
estimation of the mean of the study variable is of prime interest.